04 June 2024

Transformations

Introduction

The GigaSpaces Data Integration feature now includes transformations. The Data Integration (DI) layer is a vital part of the Digital Integration Hub (DIH) platform. It is responsible for a wide range of data integration tasks, such as ingesting data in batches or streaming data changes in real time from various sources and systems of record (SoR). The data then resides in the In-Memory Data Grid (IMDG), or Space, of the GigaSpaces Smart DIH platform. Configuring transformations enables data to be converted into a simple format that can then be used for the easy creation of services.

Acting as a conduit for data, data pipelines enable efficient processing, transformation, and delivery of data to the desired location.

Through data cleansing and transformation processes, data pipelines enhance data quality and ensure accuracy for analysis and decision-making.

Implementation

Low-code transformations can be created and updated through the SpaceDeck UI. SpaceDeck is the intuitive, streamlined GigaSpaces user interface for setting up, managing, and controlling the environment. Using SpaceDeck, users can define the tools to bring legacy System of Record (SoR) databases into the in-memory data grid that is the core of the GigaSpaces system. Refer to the SpaceDeck – Configuring Transformations page for more details.

Phase 1 Transformations

Initially, GigaSpaces will be offering the transformations listed below. The scope will be increased in future versions.

Calculated column - Adding a column, calculation of VAT, mathematical operations, string operations, date operations, etc.
FILTER (aka WHERE) clause
Calling an external REST API function - Calculation of insurance risk based on a captured transaction record. An API (application programming interface) is a set of rules that define how applications or devices can connect to and communicate with each other. A REST API is an API that conforms to the design principles of the REST (representational state transfer) architectural style.
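
As a rough illustration of the first two transformation types, the sketch below applies a calculated column and a filter to a small in-memory table. It uses Python's built-in sqlite3 module as a stand-in, not SpaceDeck or Flink SQL, and the table name, column names, and VAT rate are all hypothetical values chosen for the example.

```python
import sqlite3

# Hypothetical source table and columns, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, country TEXT, net_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "DE", 100.0), (2, "FR", 250.0), (3, "DE", 40.0)],
)

VAT_RATE = 0.25  # assumed rate, chosen only to keep the arithmetic simple

# Calculated column: gross_amount is derived from net_amount.
# Filter (WHERE clause): only rows with country 'DE' pass through.
query = """
    SELECT id, net_amount, net_amount * (1 + ?) AS gross_amount
    FROM orders
    WHERE country = 'DE'
    ORDER BY id
"""
rows = conn.execute(query, (VAT_RATE,)).fetchall()

for row in rows:
    print(row)
```

The same two ideas, a derived expression in the SELECT list and a predicate in the WHERE clause, are what a calculated column and a filter amount to in SQL terms. The exact syntax accepted by the product may differ.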

Although there is no limit to the number of transformations that can be configured, configuring an excessive number could affect performance.

Apache Flink SQL Function Support

Apache Flink is an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation. Its core is a distributed streaming data-flow engine written in Java and Scala, which executes arbitrary dataflow programs in a data-parallel and pipelined manner. GigaSpaces supports the following Flink SQL built-in functions for data transformations:

User Flow

The flow below assumes the user has already created a Space and assigned a Data Source. The Space is where GigaSpaces data is stored. It is the logical cache that holds data objects in memory and might also hold them in layered tiering. Data is hosted from multiple SoRs and consolidated as a unified data model.